Take-home Ex03

Author

Li Ziyi

Published

February 7, 2023

Modified

February 25, 2023

1 Overview

For this assignment, the salient patterns of the resale prices of public housing properties by residential towns and estates in Singapore will be explored using dataset taken from Data.gov.sg.

If you would like to read the conclusion straight, please click Section 4 to reach the bottom of this study.

1.1 Loading libraries

For this exercise,

  • tidyverse is the main package to be used for data processing

  • DT is the package to be used for interactive data preview

  • ggstatsplot is the packge to be used for statistical analysis and visualisation

  • ggiraph, gganimate and gifski are packages to enable interactive data visualisation

  • Other packages are for make-up mainly

pacman::p_load(tidyverse,
               DT,
               ggstatsplot,
               ggiraph,
               gganimate,
               gifski,
               ggthemes)

1.2 Data loading

flat_full <- read_csv("Data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
DT::datatable(flat_full,
              class ="cell-border stripe")

Through the preview of the dataset, certain data processing would be beneficial for easier analysis later on. E.g. the ‘month’ column will be split into two columns ‘year_’ and ‘month_’. Information like price per square meter and remaining number of years on the lease will also be calculated.

1.3 Data processing

flat_temp <- flat_full %>%
  separate(month,
           into = c("year_", "month_"),
           sep = "-",
           convert = TRUE) %>% 
  mutate(price_psqm = resale_price / floor_area_sqm)

flat_temp["rem_lease_yrs"] = (99 - (2022 - flat_temp$lease_commence_date))
DT::datatable(flat_temp,
              class ="cell-border stripe")

2 Selection and design consideration

p1 <- ggplot(flat_temp,
           aes(x = factor(year_),
               fill = flat_type)) +
  geom_bar_interactive(aes(tooltip = flat_type)) +
  labs(title = "Number of resale flat transactions from 2017 to 2023",
       x = "Year",
       y = "Number of transactions",
       fill = "Flat type") + 
  theme_economist() +
  theme(axis.title.y = element_text(vjust = 2.5,
                                    size = 12),
        axis.title.x = element_text(vjust = -2.5,
                                    size = 12))

girafe(
  ggobj = p1,
  width_svg = 12,
  height_svg = 6
)

Figure 1: Resale flat transactions

From the visualisation above, the top three resale transactions came from 3, 4 and 5-room flats each year. Since 2023 has just started, the main focus of this study would zoom into resales transactions from 3, 4 and 5-room flats in the year of 2022.

flat_2022_3types <- flat_temp %>% 
  filter(year_ == "2022",
         flat_type %in% c("3 ROOM", "4 ROOM", "5 ROOM")) %>% 
  mutate(mth_abb = month.abb[month_])

The resale transaction price and its trend back in 2022 would be studied. To have a more meaningful comparison, median values would be used for most of studies below, instead of average values. This is to avoid the sensitivity impact driven by the extreme values on average values. Furthermore, price per square meters (psm) would be calculated to normalise for some comparison across different flat types. From there, some of factors like town location, storey and their impact on the unit price would be investigated.

3 Visualisation and observation

3.1 Three-sample mean test

ggbetweenstats(
  data = flat_2022_3types,
  x = flat_type,
  y = resale_price,
  type = "np",
  plot.type = "boxviolin",
  title = "Non-parametric mean test for 3, 4 & 5-room HDB",
  xlab = "Flat type",
  ylab ="Resale price") +
  theme_economist() +
  theme(axis.title.y = element_text(vjust = 2.5))

Figure 2: three-sample mean test

Using a non-parametric test, from the result p = 0 < 0.05, it can be concluded that the resale price distribution does not follow a normal distribution.

3.2 Median resale price trend in 2022

To look at the resale price trend in 2022, 2022 resale transaction data is grouped by flat type and month for easier visualisation.

flat_by_mth <- flat_2022_3types %>% 
  group_by(flat_type, month_) %>% 
  summarise(total_sales = n(),
            median_sales_price = median(resale_price),
            median_house_size_sqm = median(floor_area_sqm),
            median_remaining_lease_yrs = median(rem_lease_yrs),
            max_sales_price = max(resale_price),
            min_sales_price = min(resale_price)) %>% 
  arrange(desc(total_sales))
flat_by_mth$tooltip <- c(paste0(
  "Flat type: ", flat_by_mth$flat_type,
  "\n Median resales price: ", flat_by_mth$median_sales_price
))

p2 <- ggplot(data = flat_by_mth,
             aes(x = month_, 
                 y = median_sales_price, 
                 colour = flat_type)) +
        geom_point_interactive(aes(tooltip = flat_by_mth$tooltip)) +
        geom_smooth() +
        scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
        scale_x_continuous(breaks = seq_along(month.abb),
                           labels = month.abb) +
        theme_economist() +
        theme(axis.title.y = element_text(vjust = 2.5),
              legend.position = "right") +
  labs(x = NULL,
       y = "Median resale price") +
  ggtitle("Median resale price trend in 2022 for 3, 4 and 5 ROOM HDB")


girafe(
  ggobj = p2,
  width_svg = 12
)

Figure 3: median resale price trend

Based on the point plot, it is observed that the median resale transaction price for all three flat types has been on an increasing trend throughout the whole year of 2022. This indeed aligns with my experience as a house tenant for the year, during which most of room rental or purchase contracts have seen significant surges.

Besides, the price increment from adding one more room is roughly $100,000.

3.3 Resale price psm by town for each flat type

To visualise the price distribution by towns, a boxplot is chosen here to present summary statistics all in one panel.

ggplot(flat_2022_3types,
       aes(x = town, 
           y = price_psqm,
           color = town)) +
  geom_boxplot() +
  facet_wrap(~ flat_type, nrow = 3) +
  labs(title = "Flat Resale Price (psm) by Town for 3, 4, 5-room HDB flats",
       x = NULL,
       y = "Resale Price ($ psm)") +
  guides(color = FALSE) +
  theme_economist() +
  theme(plot.title = element_text(size=16, hjust=0.5),
        axis.text.x = element_text(vjust = 0.5,
                                   angle = 60,
                                   size = 8),
        axis.title.y = element_text(size=15,
                                    vjust = 2.5),
        panel.spacing = unit(3, "line"))

Figure 4: resale price psm by town

Using the chart, the resale price psm for one town remain largely the same across three flat types.

For 3-room type, the median resale price psm are rather close for all towns. The maximum resale price psm for 3-room type was seen in Bukit Merah area and the minimum resale unit price was most likely seen in Toa Payoh.

For 4-room type, the median resale price psm in Central area and in Queenstown were rather higher than most of the rest areas.

For 5-room type, the median resale price psm in Central area were extremely high, while the rest of areas’ were relatively close.

What’s more, the price psm for each flat type could vary widely even within the same area, as can be seen from the difference between max and min values.

3.4 Median resales price psm by town for each storey range

To visualise the potential impact from the location (town) and the storey range, a heatmap is chosen here to cross examine multivariate data. This is because heatmaps are generally good for showing variance across multiple variables to reveal patterns.

flat_heatmap <- flat_2022_3types %>% 
  group_by(town, storey_range) %>% 
  summarise(total_sales = n(),
            median_sales_price = median(resale_price),
            median_house_size_sqm = median(floor_area_sqm),
            median_remaining_lease_yrs = median(rem_lease_yrs)) %>% 
  arrange(desc(total_sales))
heatmap <- ggplot(data = flat_heatmap, 
                  mapping = aes(x = town, 
                                y = storey_range,
                                fill = median_sales_price)) +
            geom_tile() +
  labs(title = "Median resales price per square meter by town and storey for 3, 4 & 5-Room HDB flat", 
       x = NULL, 
       y = "Storey") +
  scale_fill_gradient(name = "Price per square meter",
                      low = "#F2F2F2",
                      high = "#00532F")+
  theme_economist() +
  theme(axis.text.x = element_text(angle = 60,
                                   vjust = 0.5,
                                   size = 8),
        axis.title.y = element_text(vjust = 2.5),
        legend.position = "right",
        legend.text = element_text(size = 6),
        legend.title = element_text(size = 8))

heatmap

Figure 5: heatmap by town and storey range

Based on the heatmap produced, it is generally true that higher storey units tends to have a better price psm, as illustrated from the darker green on higher storeys.

However, for some unpopular towns like Choa Chu Kang, Woodlands, Jurong West, Pasir Ris, Punggol, Sembawang and Yishun, the price psm difference do not seem significant between higher and lower storey units.

Additionally, it can be observed that resale flats in Bukit Timah and Yishun are much lower rised compared to the rest of towns. Especially, the highest unit observed in Bukit Timah was only 15-storey. On the other hand, resales flats in Central Area are the highest with storey higher than 50 available. Besides, the price psm for 30-stroey in Central Area were mostly highest comparing to other towns.

3.5 Resales price per psm against remaining lease years throughout 2022

The last factor that is being looked into is the the number of years remained on the lease. To isolate the potential impact from different month of the year (e.g. bonus payout, school holidays or holiday seasons), a scatter plot enabled by animation of transition of months is presented below.

p3 <-
ggplot(flat_2022_3types,
       aes(x = rem_lease_yrs,
           y = price_psqm,
           colour = flat_type)) +
  geom_point() +
  labs(title = "Resale price per square meter against remaining lease years",
       x = "Remaining Lease (Years)",
       y = "Resale price ($ psm)",
       fill ="Flat type",
       caption = "Month of 2022: {frame_time}") +
  theme_economist() +
  geom_smooth(method="lm",
              se = FALSE,
              color = "mediumorchid1",
              formula = y ~ x) +
  theme(plot.title = element_text(size = 12,
                                  hjust = 0),
        axis.title.x = element_text(size = 12,
                                    vjust = 0.5),
        axis.title.y = element_text(size = 12,
                                    vjust = 2.5),
        legend.position = "none",
        panel.spacing = unit(2, "line")) +
  facet_grid(flat_type ~ .) +
  transition_time(flat_2022_3types$month_) +
  ease_aes('linear')

animate(p3,
        nframes = 12,
        fps = 0.5)

Figure 6: scatterplot animation by month

According to the chart, throughout 2022 from Janurary to December, it’s certain that the resale price psm increases along the number of years remaining on the lease.

What is also observed was that most of resale transactions for those with less than 50 years on the lease were 3-room flat mainly.

Furthermore, relatively more transactions are seen particularly for those with the remaining lease above than 90 years.

4 Insights and thoughts

i. An increasing trend has been observed from 3, 4 and 5-room HDB resale price in 2022 (see Figure 3). No sign of a change in the trend observed from the data as well as the macro environment, based on the daily news.

ii. Some factors to consider for purchasing a resale 3, 4 or 5-room HDB flat:

  • Central Area, Kallang and Queenstown are the top most expensive areas based on the resale transactions data in 2022 (see Figure 4).

  • High storey units may not be available in all areas.

  • Relative lower price differentiation due to storey range difference could be found in certain areas, but most of them are unpopular ones (see Figure 5). This might be due to the fact that these areas are closer to the edge of Singapore. The inaccessibility and inconvenience for commuting to work and leisure activities could’ve led to a lower demand.

iii. Resale 3, 4 & 5-room HDB flats price psm is positively correlated with number of years remaining on the lease.

  • Resale activities are quite active for lease remaining more than 90 years. This is rather interesting. One of my immature guess is that this is the result from young couples whose BTOs have reached the minimum occupation period (MOP) and therefore, have traded their BTO. To secure a satisfying profit from the ‘discount’ they were able to enjoy from the government BTO policy, those new units were therefore active in the resale market.